This document describes the Apple® Speech Manager, which provides a standardized method for Macintosh® applications to generate synthesized speech.
The document provides an overview of the Speech Manager followed by general information about generating speech from text. The necessary information and calls needed by all text-to-speech applications are given next, followed by a simple example of speech generation. More advanced calls and special-purpose routines are described last.
Speech Manager Overview
A complete system for speech synthesis consists of the elements shown in Figure 1-1.
Figure 1-1 Speech synthesis components
An application calls routines in the Speech Manager to convert character strings into speech and to adjust various parameters that affect the quality or character of the spoken output. The Speech Manager is responsible for dispatching these requests to a speech synthesizer. The speech synthesizer converts the text into sound and creates the actual audio output.
The Apple-supplied voices, pronunciation dictionaries, and speech synthesizer may reside in a single file or in separate files. These files are clearly identifiable as Speech Manager–related files and are installed and removed by being dragged into or out of the System Folder. Additional voices can be provided by bundling the resources in the resource forks of specific applications. These resources are considered private to that particular application. It is up to the individual developers to decide whether the voice resources they provide are usable on a systemwide basis or only from within their applications.
In the first release of the Speech Manager, pronunciation dictionaries are managed entirely by the application. The application is free to store dictionaries in either the resource or the data fork of a file. The application is responsible for loading the individual dictionaries into RAM and then passing a handle to the dictionary data to the Speech Manager.
Applications that use the Speech Manager must provide their own human interface for selecting voices and/or controlling other speech characteristics. If voices are provided in separate files, the speech synthesizer developer is responsible for providing a method for installing these resources into the System Folder or Extensions folder. The computer must be rebooted after speech synthesizers are added to or removed from the System Folder for the desired changes to be recognized.
Speech Manager Concepts
On a simple level, speech synthesis from text input is a two-stage process. First, plain-language English text is converted into phonemic representations for the individual words. Phonemes stand for specific sounds; for a complete explanation, see “Summary of Phonemes and Prosodic Controls,” later in this document. The resulting sequence of phonemes is converted into audible sounds by mapping the individual phonemes to a series of waveforms, which are sent to the sound hardware to be played.
In reality, each stage is more complicated than this description suggests. For example, during the text-to-phoneme conversion stage, number strings, abbreviations, and special symbols must be detected and converted into appropriate words before being converted into phonemes. When a sentence such as “He earned over $2,000,000 in 1990” is spoken, it would normally be preferable to say “He earned over two million dollars in nineteen-ninety” rather than “He earned over dollar-sign, two, comma, zero, zero, zero, comma, zero, zero, zero, in one, nine, nine, zero.” To produce the desired spoken output automatically, knowledge of these sorts of constructions is built into the synthesizer.
The phoneme-to-sound conversion stage is also complex. Phonemes by themselves are often not sufficient to describe the way a word should be pronounced. For example, the word “object” is pronounced differently depending on whether it is used as a noun or a verb. (When it is used as a noun, the stress is placed on the first syllable. As a verb, the stress is placed on the second syllable.) In addition to stress information, phonemes must often be augmented with pitch, duration, and other information to produce intelligible, natural-sounding speech.
The speech synthesizer has many built-in rules for automatically converting text into the complex phonemic representation described above. However, there will always be words and phrases that are not pronounced the way you want. The Speech Manager allows you to provide raw phonemic information directly in order to enable very precise control over the spoken output.
By default, speech synthesizers expect input in normal language text. However, using the input mode controls of the Speech Manager, you can tell the synthesizer to process input text in raw phonemic form. By using the embedded commands described in the next section, you can even mix normal language text with phonemic text within a single string or text buffer.
See “Summary of Phonemes and Prosodic Controls,” later in this document, for a listing of the phonemic character set and each character’s interpretation.
Using the Speech Manager
This section describes the routines used to add speech synthesis features to an application. It is organized into three sections: “Getting Started” (Easy), “Essential Calls—Simple and Useful” (Intermediate), and “Advanced Routines.”
Getting Started
If you’re just getting started with text-to-speech conversion using the Speech Manager, the following routines will get you up and running with minimal effort. If you’re developing an application that does not need to choose voices, use more than one channel of speech, or exercise real-time control over the synthesized speech, these may be the only routines you need.
Determining If the Speech Manager Is Available
You can find out if the Speech Manager is available with a single call to the Gestalt Manager.
Use the Gestalt toolbox routine and the selector gestaltSpeechAttr to determine whether or not the Speech Manager is available, as shown in Listing 1-1. If Gestalt returns noErr, then the parameter argument will contain a 32-bit value indicating one or more attributes of the installed Speech Manager. If the Speech Manager exists, the bit specified by gestaltSpeechMgrPresent is set.
Listing 1-1 Determining if the Speech Manager is available
Boolean SpeechAvailable (void) {
    OSErr err;
    long result;

    err = Gestalt(gestaltSpeechAttr, &result);
    if ((err != noErr) || !(result & (1 << gestaltSpeechMgrPresent)))
        return FALSE;
    else
        return TRUE;
}
Which Version of the Speech Manager Is Running?
Once you have determined that the Speech Manager is installed, you can see which version of the Speech Manager is running by calling SpeechManagerVersion.
SpeechManagerVersion
Returns the version of the Speech Manager installed in the system.
pascal NumVersion SpeechManagerVersion (void);
DESCRIPTION
SpeechManagerVersion returns the version of the Speech Manager installed in the system. This call should be used to determine the compatibility of your program with the currently installed Speech Manager.
RESULT CODES
None
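For example, an application that depends on features of a later Speech Manager release might compare the fields of the returned NumVersion value, as in the following sketch; the required version numbers are placeholders for illustration:

Boolean SpeechVersionAtLeast (unsigned char major, unsigned char minor) {
    NumVersion v = SpeechManagerVersion();

    if (v.majorRev != major)
        return (v.majorRev > major);
    // The high-order nibble of minorAndBugRev holds the minor revision
    return ((v.minorAndBugRev >> 4) >= minor);
}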
Making Some Noise
The most basic operation of the Speech Manager is accomplished by using the SpeakString call, which passes a text string to the Speech Manager to be spoken.
SpeakString
The SpeakString function passes a text string to the Speech Manager to be spoken.
pascal OSErr SpeakString (StringPtr myString);
Field descriptions
myString Text string to be spoken
DESCRIPTION
SpeakString attempts to speak the Pascal-style text string contained in myString. Speech is produced asynchronously using the default system voice. When an application calls this function, the Speech Manager makes a copy of the passed string and creates any structures required to speak it. As soon as speaking has begun, control is returned to the application. The synthesized speech is generated transparently to the application so that normal processing can continue while the text is being spoken. No further interaction with the Speech Manager is required at this point, and the application is free to release, purge, or otherwise dispose of the original string.
If SpeakString is called while a prior string is still being spoken, the audio currently being synthesized is interrupted immediately. Conversion of the new text into speech is then initiated. If an empty (zero length) string or a null string pointer is passed to SpeakString, it stops the synthesis of any prior string but does not generate any additional speech.
As with all Speech Manager routines that expect text arguments, the text may contain embedded speech control commands.
RESULT CODES
noErr                 0    No error
memFullErr         –108    Not enough memory to speak
synthOpenFailed    –241    Could not open another speech synthesizer channel
Determining If Speaking Is Complete
Once an application starts a speech process with SpeakString, the next thing it will probably need to know is whether the string has been completely spoken. It can use SpeechBusy to determine whether or not the system is still speaking.
SpeechBusy
The SpeechBusy routine is useful when you want to ensure that an earlier speech request has been completed before having the system speak again.
pascal short SpeechBusy (void);
DESCRIPTION
SpeechBusy returns the number of channels of speech that are currently synthesizing speech in the application. If you use just SpeakString to initiate speech, SpeechBusy will always return 1 as long as speech is being produced. When SpeechBusy returns 0, all initiated speech has finished.
RESULT CODES
None
A Simple Example
The example shown in Listing 1-2 demonstrates how to use the routines introduced in this section. It first makes sure the Speech Manager is available. Then it starts speaking a string (hard-coded in this example, but more commonly loaded from a resource) and loops, doing some screen drawing, until the string is completely spoken. This example uses the SpeechAvailable routine shown in Listing 1-1.
Listing 1-2 Elementary Speech Manager calls
OSErr err;

if (SpeechAvailable()) {
    err = SpeakString("\pThe cat sat on the mat.");
    if (err == noErr)
        while (SpeechBusy() > 0)
            CoolAnimationRoutine();
    else
        NotSoCoolAlertRoutine(err);
}
Essential Calls—Simple and Useful
While the routines presented in the last section are simple to use, their applicability is limited to a few basic speech scenarios. This section describes additional routines that let you work with different voices and adjust some basic characteristics of the synthesized speech.
Working With Voices
When describing a person’s voice, we talk about the particular set of characteristics that help us to distinguish that person’s voice from another. For example, the rate at which one speaks (slow or fast) and the average pitch (high or low) characterize a particular speaker on a crude level. In the context of the Speech Manager, a voice is the set of parameters that specify a particular quality of synthesized speech. This portion of the Speech Manager is used to determine which voices are available and to select particular voices.
Every voice has a unique ID associated with it, which is the primary way an application refers to the voice. Within the Speech Manager, a voice ID is represented by a VoiceSpec structure.
The Speech Manager provides two routines to count and step through the list of currently available voices. CountVoices is used to compute how many voices are available with the current system. GetIndVoice uses an index, starting at 1, to step through the currently installed voices, returning information about each voice in turn.
Use the GetIndVoice routine to step through the list of available voices. It will fill a VoiceSpec record that can be used to obtain descriptive information about the voice or to speak using that voice.
Any application that wishes to use multiple voices will probably need additional information about the available voices beyond the VoiceSpec structure, such as the name of the voice and perhaps what script and language each voice supports. This information might be presented to the user in a “voice picker” dialog box or voice menu, or it might be used internally by an application trying to find a voice that meets certain criteria. Applications can use the GetVoiceDescription routine for these purposes.
MakeVoiceSpec
To maximize compatibility with future versions of the Speech Manager, you should always use MakeVoiceSpec instead of setting the fields of the VoiceSpec structure directly.
pascal OSErr MakeVoiceSpec (OSType creator, OSType id, VoiceSpec *voice);

typedef struct VoiceSpec {
    OSType creator;  // determines which synthesizer is required
    OSType id;       // voice ID on the specified synthesizer
} VoiceSpec;

Field descriptions
creator    The synthesizer required by your application
id         Identification number for this voice
*voice     Pointer to the VoiceSpec structure
DESCRIPTION
Most voice management routines expect to be passed a pointer to a VoiceSpec structure. MakeVoiceSpec is a utility routine provided to facilitate the creation of VoiceSpec records. On return, the passed VoiceSpec structure contains the appropriate values.
Voices are stored in resources of type 'ttsv' in the resource fork of Macintosh files. The Speech Manager uses the same search method as the Resource Manager, looking for voice resources in three different locations when attempting to resolve VoiceSpec references. It first looks in the application’s resource file chain. If the specified voice is not found in an open file, it then looks in the System Folder and the Extensions folder (or in just the System Folder under System 6) for files of type 'ttsv' (single-voice files) or 'ttsb' (multivoice files) and in text-to-speech synthesizer component files (file type 'INIT' or 'thng'). Voices stored in the System Folder or Extensions folder are normally available to all applications. Voices stored in the resource fork of an application file are private to the application.
RESULT CODES
noErr    0    No error
While the determination of specific voice ID values is mostly left to synthesizer developers, the voice creator values are specified by Apple (they would ordinarily correspond to a developer’s currently assigned creator ID). For both the creator and id fields Apple further reserves the set of OSType values specified entirely by space characters and lowercase letters. Apple is establishing a standard suite of voice ID values that developers can count upon being available with all speech synthesizers.
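A call to MakeVoiceSpec might look like the following sketch; the creator code 'xmpl' and voice ID 2 are hypothetical values used only for illustration:

VoiceSpec spec;
OSErr err;

err = MakeVoiceSpec('xmpl', 2, &spec);  // hypothetical creator and voice ID
if (err == noErr) {
    // spec can now be passed to GetVoiceDescription, NewSpeechChannel, etc.
}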
CountVoices
The CountVoices routine returns the number of voices available.
pascal OSErr CountVoices (short *voiceCount);
Field descriptions
voiceCount Number of voices available to the application
DESCRIPTION
Each time CountVoices is called, the Speech Manager searches for new voices. This algorithm supports dynamic installation of voices by applications or users. On return, the voiceCount parameter contains the number of voices available.
RESULT CODES
noErr    0    No error
GetIndVoice
The GetIndVoice routine returns information about a specific voice.

pascal OSErr GetIndVoice (short index, VoiceSpec *voice);

Field descriptions
index      Index of the voice, starting at 1
*voice     Pointer to the VoiceSpec structure

DESCRIPTION
As with all other index-based routines in the Macintosh Toolbox, an index value of 1 causes GetIndVoice to return information for the first voice. The order that voices are returned is not presently defined and should not be assumed. Speech Manager behavior when voice files or resources are added, removed, or modified is also presently undefined. However, calling CountVoices or GetIndVoice with an index of 1 will force the Speech Manager to update its list of available voices. GetIndVoice will return a voiceNotFound error if the passed index value exceeds the number of available voices.
RESULT CODES
noErr              0    No error
voiceNotFound   –244    Voice resource not found
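The following sketch combines CountVoices, GetIndVoice, and GetVoiceDescription (described next) to step through every installed voice; AddVoiceNameToMenu is a hypothetical application routine standing in for whatever your application does with each name:

void BuildVoiceMenu (void) {
    OSErr err;
    short voiceCount, i;
    VoiceSpec voice;
    VoiceDescription vd;

    if (CountVoices(&voiceCount) != noErr)
        return;
    for (i = 1; i <= voiceCount; i++) {
        if (GetIndVoice(i, &voice) != noErr)
            break;
        err = GetVoiceDescription(&voice, &vd, sizeof(VoiceDescription));
        if (err == noErr)
            AddVoiceNameToMenu(vd.name);  // hypothetical application routine
    }
}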
GetVoiceDescription
The GetVoiceDescription routine returns information about a voice beyond that provided by GetIndVoice.

pascal OSErr GetVoiceDescription (VoiceSpec *voice, VoiceDescription *info, long infoLength);
enum {kNeuter = 0, kMale, kFemale};  // returned in gender field below

typedef struct VoiceDescription {
    long      length;       // size of structure
    VoiceSpec voice;        // synth and ID info for voice
    long      version;      // version code for voice
    Str63     name;         // name of voice
    Str255    comment;      // additional text info about voice
    short     gender;       // neuter, male, or female
    short     age;          // approximate age in years
    short     script;       // script code of text voice can process
    short     language;     // language code of voice output speech
    short     region;       // region code of voice output speech
    long      reserved[4];  // always zero - reserved
} VoiceDescription;
Field descriptions
*voice Pointer to the VoiceSpec structure
*info Pointer to structure containing parameters for the specified voice
infoLength Length in bytes of info structure
DESCRIPTION
The Speech Manager fills out the passed VoiceDescription fields with the correct information for the specified voice. If a null VoiceSpec pointer is passed, the Speech Manager returns information for the system default voice. If the VoiceDescription pointer is null, the Speech Manager simply verifies that the specified VoiceSpec refers to an available voice. If VoiceSpec does not refer to a known voice, GetVoiceDescription returns a voiceNotFound error. Listing 1-3 illustrates how to call GetVoiceDescription.
To maximize compatibility with future versions of the Speech Manager, the application must pass the size of the VoiceDescription structure. Having the application do this ensures that the Speech Manager will never write more data into the passed structure than will fit even if additional information fields are defined in the future. On returning from GetVoiceDescription, the length field is set to reflect the length of data actually written by this routine.
Listing 1-3 Getting information about a voice
OSErr GetVoiceGender (VoiceSpec *voicePtr, short *gender) {
    OSErr err;
    VoiceDescription vd;

    err = GetVoiceDescription(voicePtr, &vd, sizeof(VoiceDescription));
    if (err == noErr) {
        if (vd.length > offsetof(VoiceDescription, gender))
            *gender = vd.gender;
        else
            err = badStructLen;  // application-defined error code
    }
    return err;
}
RESULT CODES
noErr              0    No error
paramErr         –50    Parameter error
memFullErr      –108    Not enough memory to load voice into memory
voiceNotFound   –244    Voice resource not found
Managing Connections to Speech Synthesizers
Using the routines described earlier in this document, an application can select the voice with which to speak. The next step is to associate the selected voice with the proper speech synthesizer. This is accomplished by creating a new speech channel with the NewSpeechChannel routine. A speech channel is a private communication connection to the speech synthesizer, much as a file reference number is a communication channel to an open file in the Macintosh file system.
The DisposeSpeechChannel routine closes a speech channel when the application is finished with it and releases any resources that have been allocated to support the speech synthesizer and are no longer needed.
NewSpeechChannel
The NewSpeechChannel routine creates a new speech channel.
pascal OSErr NewSpeechChannel (VoiceSpec *voice, SpeechChannel *chan);
Field descriptions
*voice Pointer to the VoiceSpec structure
*chan Pointer to the new channel
DESCRIPTION
The Speech Manager automatically locates and opens a connection to the proper synthesizer for a specified voice and sets up a channel at the location pointed to by *chan so that it is ready to speak with that voice. If a null VoiceSpec pointer is passed to NewSpeechChannel, the Speech Manager uses the current system default voice.
There is no predefined limit to the number of speech channels an application may create. However, system constraints on available RAM, processor loading, and number of available sound channels may limit the number of speech channels actually possible.
RESULT CODES
noErr                 0    No error
memFullErr         –108    Not enough memory to open speech channel
synthOpenFailed    –241    Could not open another speech synthesizer channel
voiceNotFound      –244    Voice resource not found
DisposeSpeechChannel
The DisposeSpeechChannel routine disposes of an existing speech channel.

pascal OSErr DisposeSpeechChannel (SpeechChannel chan);

Field descriptions
chan       Specific speech channel

DESCRIPTION
DisposeSpeechChannel disposes of an existing speech channel and releases the resources allocated to support it. Any speech channels that have not been explicitly disposed of by the application are released automatically by the Speech Manager when the application quits.
All the remaining routines in this section require a valid speech channel to work properly. Once the application has successfully created a speech channel, it can start to speak. You use the SpeakText routine to begin speaking on a speech channel.
At any time during the speaking process, the application can stop the synthesizer’s speech. The StopSpeech routine will immediately abort any speech being produced on the specified speech channel and force the channel back into an idle state.
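As a minimal sketch of this life cycle, the routine below opens a channel with the system default voice (by passing a null VoiceSpec pointer), speaks a buffer, waits for completion, and disposes of the channel. It assumes a single-channel application, since SpeechBusy counts all of the application's channels:

OSErr SpeakWithDefaultVoice (Ptr textBuf, long byteLength) {
    SpeechChannel chan;
    OSErr err;

    err = NewSpeechChannel(NULL, &chan);  // null VoiceSpec = default voice
    if (err == noErr) {
        err = SpeakText(chan, textBuf, byteLength);
        while ((err == noErr) && (SpeechBusy() > 0)) {
            ;  // a real application would do useful work or yield here
        }
        DisposeSpeechChannel(chan);
    }
    return err;
}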
SpeakText
The SpeakText routine converts a designated text into speech.
pascal OSErr SpeakText (SpeechChannel chan, Ptr textBuf, long byteLength);
Field descriptions
chan Specific speech channel
textBuf Buffer of text
byteLength Length of textBuf
DESCRIPTION
In addition to a valid speech channel, SpeakText expects a pointer to the text to be spoken and the length in bytes of the text buffer. SpeakText will convert the given text stream into speech using the voice and control settings for that speech channel. The speech is generated asynchronously. This means that control is returned to your application before the speech has finished (probably even before it has begun). The maximum length of text buffer that can be spoken is limited only by the available RAM. However, it’s generally not very friendly to force the user to listen to long uninterrupted text unless the user requests it.
If SpeakText is called while it is currently busy speaking the contents of a prior text buffer, it will immediately stop speaking from the prior buffer and will begin speaking from the new text buffer as soon as possible. As with SpeakString, described on page 5, if an empty (zero length) string or a null text buffer pointer is passed to SpeakText, this will have the effect of stopping the synthesis of any prior text but will not generate any additional speech.
WARNING
With SpeakText, unlike with SpeakString, the text buffer must be locked in memory and must not move during the entire time that it is being converted into speech. This buffer is read at interrupt time, and very undesirable effects will happen if it moves or is purged from memory.
StopSpeech
The StopSpeech routine terminates speech delivery on a specified channel.
pascal OSErr StopSpeech (SpeechChannel chan);
Field descriptions
chan Specific speech channel
DESCRIPTION
After returning from StopSpeech, the application can safely release any text buffer that the speech synthesizer has been using. The SpeechBusy routine, described on page 6, can be used to determine if the text has been completely spoken. (In an environment with multiple speech channels, you may need to use the more advanced status routine GetSpeechInfo, described on page 25, to determine if a specific channel is still speaking.) StopSpeech can be called for an already idle channel without ill effect.
The Speech Manager provides several methods of adjusting the variables that can affect the way speech is synthesized. Although most applications probably do not need to use these advanced features, two of the speech variables, speaking rate and speaking pitch, are useful enough that a very simple way of adjusting these parameters on a channel-by-channel basis is provided. Routines are supplied that enable an application to both set and get these parameters. However, the audible effects of changing the rate and pitch of speech vary from synthesizer to synthesizer; you should test the actual results on all synthesizers with which your application may work.
Speaking rates are specified in terms of words per minute (WPM). While this unit of measurement is difficult to define in any precise way, it is generally easy to understand and use. The range of supported rates is not predefined by the Speech Manager. Each speech synthesizer provides its own range of speaking rates. Furthermore, any specific rate value will correspond to slightly different rates with different synthesizers.
Speaking pitches are defined on a musical scale that corresponds to the keys on a standard piano keyboard. By convention, pitches are represented as fixed-point values in the range from 0.000 through 100.000, where 60.000 corresponds to middle C (261.625 Hz) on a conventional piano. Pitches are represented on a logarithmic scale. On this scale, a change of +12 units corresponds to doubling the frequency, while a change of –12 units corresponds to halving the frequency. For a further discussion of pitch values, see “Getting Information About a Speech Channel,” later in this document.
Typical voice frequencies might range from around 90 Hertz for a low-pitched male voice to perhaps 300 Hertz for a high-pitched child’s voice. These frequencies correspond to pitch values of 41.526 and 53.526, respectively.
Changes in speech rate and pitch are effective immediately (as soon as the synthesizer can respond), even if they occur in the middle of a word.
SetSpeechRate
The SetSpeechRate routine sets the speaking rate on a designated speech channel.

pascal OSErr SetSpeechRate (SpeechChannel chan, Fixed rate);

Field descriptions
chan       Specific speech channel
rate       Speaking rate in words per minute

DESCRIPTION
The SetSpeechRate routine is used to adjust the speaking rate on a speech channel. The rate parameter is specified as a fixed-point, words-per-minute value. As a general rule of thumb, “normal” speaking rates range from around 150 WPM to around 180 WPM. It is important when working with speaking rates, however, to keep in mind that users will differ greatly in their ability to understand synthesized speech at a particular rate, based upon their level of experience listening to the voice and their ability to anticipate the types of utterances they will encounter.
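Because the rate is a fixed-point value with the integer part in the high-order 16 bits, a rate of 180 WPM can be expressed by shifting, as in this sketch:

OSErr SetConversationalRate (SpeechChannel chan) {
    Fixed rate = 180L << 16;  // 180 words per minute as a Fixed value

    return SetSpeechRate(chan, rate);
}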
The code fragment in Listing 1-4 illustrates many of the routines introduced in this section. The example steps through the list of available voices to find the first female voice. Then it creates a new speech channel and begins speaking. While the voice is speaking, the pitch of the voice is continually adjusted around the original pitch. If the mouse button is pressed while the voice is speaking, the code halts the speech and exits. This example uses the SpeechAvailable and GetVoiceGender routines shown earlier in Listing 1-1 and Listing 1-3.
Listing 1-4 Putting it all together
OSErr err;
Str255 myStr = "\pThe bat sat on my hat.";
VoiceSpec voice;
VoiceDescription vd;
Boolean gotVoice = FALSE;
short voiceCount, gender, i;
SpeechChannel chan;
Fixed origPitch, newPitch;

if (myStr[0] && SpeechAvailable()) {
    err = CountVoices(&voiceCount);  // count the available voices
    i = 1;
    while ((i <= voiceCount) &&
           ((err = GetIndVoice(i++, &voice)) == noErr)) {
        err = GetVoiceGender(&voice, &gender);
        if ((err == noErr) && (gender == kFemale)) {
            gotVoice = TRUE;
            break;
        }
    }
    if (gotVoice) {
        err = NewSpeechChannel(&voice, &chan);
        if (err == noErr) {
            err = GetSpeechPitch(chan, &origPitch);  // current pitch
            if (err == noErr)
                err = SpeakText(chan, &myStr[1], myStr[0]);
            i = 0;
            if (err == noErr)
                while (SpeechBusy() > 0) {
                    CoolAnimationRoutine();
                    newPitch = (i - 4) << 16;  // fixed-point pitch offset
                    newPitch += origPitch;
                    i = (i + 1) & 7;  // steps from 0 to 7 repeatedly
                    err = SetSpeechPitch(chan, newPitch);
                    if ((err != noErr) || Button()) {
                        err = StopSpeech(chan);
                        break;
                    }
                }
            err = DisposeSpeechChannel(chan);
        }
    }
    if (err != noErr)
        NotSoCoolAlertRoutine(err);
}
Advanced Routines
This section describes several advanced or rarely used Speech Manager routines. You can use them to improve the quality of your application’s speech.
Advanced Speech Controls
The StopSpeech routine, described in “Starting and Stopping Speech,” earlier in this document, provides a simple way to interrupt any speech output instantly. In some situations it is preferable to be able to stop speech production at the next natural boundary, such as the next word or the end of the current sentence. StopSpeechAt provides that capability.
Similarly, the PauseSpeechAt routine causes speech to pause at a specified point in the text being spoken; the ContinueSpeech routine resumes speech after it has paused.
In addition to SpeakString and SpeakText, described earlier in this document, the Speech Manager provides a third, more general routine. SpeakBuffer is the low-level speech routine upon which the other two are built. SpeakBuffer provides greater control through the use of an additional flags parameter.
The SpeechBusySystemWide routine tells you if any speech is currently being synthesized in your application or elsewhere on the computer.
StopSpeechAt
The StopSpeechAt routine halts speech at a specific point in the text being spoken.
pascal OSErr StopSpeechAt (SpeechChannel chan, long whereToStop);
enum {
    kImmediate     = 0,
    kEndOfWord     = 1,
    kEndOfSentence = 2
};
Field descriptions
chan Specific speech channel
whereToStop Location in text at which speech is to stop
DESCRIPTION
StopSpeechAt is used to halt the production of speech at a specified point in the text. The whereToStop argument should be set to one of the following constants:
■ The kImmediate constant stops speech output immediately.
■ The kEndOfWord constant lets speech continue until the current word has been spoken.
■ The kEndOfSentence constant lets speech continue until the end of the current sentence has been reached.
This routine returns immediately, although speech output continues until the specified point has been reached.
WARNING
You must not release the memory associated with the current text buffer until the channel status indicates that the speech channel output is no longer busy.
If the end of the input text buffer is reached before the specified stopping point, the speech synthesizer will stop at this point. Once the stopping point has been reached, the application is free to release the text buffer. Calling StopSpeechAt with whereToStop equal to kImmediate is equivalent to calling StopSpeech, described on page 14.
Contrast the StopSpeechAt routine with PauseSpeechAt, described next.
PauseSpeechAt
The PauseSpeechAt routine causes speech to pause at a specified point in the text being spoken.
pascal OSErr PauseSpeechAt (SpeechChannel chan,
long whereToPause);
enum {
    kImmediate     = 0,
    kEndOfWord     = 1,
    kEndOfSentence = 2
};
Field descriptions
chan Specific speech channel
whereToPause Location in text at which speech is to pause
DESCRIPTION
PauseSpeechAt makes speech production pause at a specified point in the text. The whereToPause parameter should be set to one of these constants:
■ The kImmediate constant stops speech output immediately.
■ The kEndOfWord constant lets speech continue until the current word has been spoken.
■ The kEndOfSentence constant lets speech continue until the end of the current sentence has been reached.
When the specified point is reached, the speech channel enters the paused state, reflected in the channel’s status. PauseSpeechAt returns immediately, although speech output will continue until the specified point.
If the end of the input text buffer is reached before the specified pause point, speech output pauses at the end of the buffer.
PauseSpeechAt differs from StopSpeech and StopSpeechAt in that a subsequent call to ContinueSpeech, described next, causes the contents of the current text buffer to continue being spoken.
WARNING
While in a paused state, the last text buffer must remain available at all times and must not move. While paused, the SpeechChannel status indicates outputBusy = true and outputPaused = true.
ContinueSpeech
The ContinueSpeech routine resumes speech after it has been halted by the PauseSpeechAt routine.
pascal OSErr ContinueSpeech (SpeechChannel chan);
Field descriptions
chan Specific speech channel
DESCRIPTION
At any time after PauseSpeechAt is called, ContinueSpeech may be called to continue speaking from the point at which speech paused. Calling ContinueSpeech on a channel that is not currently in a paused state has no effect; calling it before a pause has taken effect cancels the pause.
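A pause-and-resume sequence might look like the following sketch; WaitForUserToContinue is a hypothetical application routine:

OSErr PauseUntilUserContinues (SpeechChannel chan) {
    OSErr err;

    err = PauseSpeechAt(chan, kEndOfWord);  // pause at next word boundary
    if (err == noErr) {
        WaitForUserToContinue();     // hypothetical application routine
        err = ContinueSpeech(chan);  // resume where speech paused
    }
    return err;
}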
SpeakBuffer
The SpeakBuffer routine speaks a buffer of text, using control flags to customize its behavior.

pascal OSErr SpeakBuffer (SpeechChannel chan, Ptr textBuf, long byteLength, long controlFlags);

enum {
    kNoEndingProsody    = 1,
    kNoSpeechInterrupt  = 2,
    kPreflightThenPause = 4
};

Field descriptions
chan          Specific speech channel
textBuf       Buffer of text
byteLength    Length of textBuf
controlFlags  Control flags to customize speech behavior
DESCRIPTION
When the controlFlags parameter is set to 0, SpeakBuffer behaves identically to SpeakText, described on page 13.
The kNoEndingProsody flag bit is used to control whether or not the speech synthesizer automatically applies ending prosody, the speech tone and cadence that normally occur at the end of a statement. Under normal circumstances (for example, when the flag bit is not set), ending prosody is applied to the speech when the end of the textBuf data is reached. This default behavior can be disabled by setting the kNoEndingProsody flag bit.
Some synthesizers do not speak until the kNoEndingProsody flag bit is reset, or they encounter a period in the text, or textBuf is full.
The kNoSpeechInterrupt flag bit is used to control the behavior of SpeakBuffer when called on a speech channel that is still busy. When the flag bit is not set, SpeakBuffer behaves similarly to SpeakString and SpeakText, described earlier in this document. Any speech currently being produced on the specified speech channel is immediately interrupted and then the new text buffer is spoken. When the kNoSpeechInterrupt flag bit is set, however, a request to speak on a channel that is still busy processing a prior text buffer will result in an error. The new buffer is ignored and the error synthNotReady is returned. If the prior text buffer has been fully processed, the new buffer is spoken normally.
The kPreflightThenPause flag bit is used to minimize the latency experienced when attempting to speak. Ordinarily whenever a call to SpeakString, SpeakText, or SpeakBuffer is made, the speech synthesizer must perform a certain amount of initial processing before speech output is heard. This startup latency can vary from a few milliseconds to several seconds depending upon which speech synthesizer is being used. Recognizing that larger startup delays may be detrimental to certain applications, a mechanism is provided to give the synthesizer a chance to perform any necessary computations at noncritical times. Once the computations have been completed, the speech is able to start instantly. When the kPreflightThenPause flag bit is set, the speech synthesizer will process the input text as necessary to the point where it is ready to begin producing speech output. At this point, the synthesizer will enter a paused state and return to the caller. When the application is ready to produce speech, it should call the ContinueSpeech routine to begin speaking.
RESULT CODES
noErr              0    No error
synthNotReady   –242    Speech channel is still busy speaking
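For example, an application could preflight a buffer during idle time and later start the speech instantly. This sketch assumes the text buffer remains locked and unmoved throughout:

// Preflight now; no audio is produced until ContinueSpeech is called.
OSErr PreflightUtterance (SpeechChannel chan, Ptr textBuf, long byteLength) {
    return SpeakBuffer(chan, textBuf, byteLength, kPreflightThenPause);
}

// Later, when the speech should actually begin:
OSErr BeginPreflightedUtterance (SpeechChannel chan) {
    return ContinueSpeech(chan);
}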
SpeechBusySystemWide
You can use SpeechBusySystemWide to determine if any speech is currently being synthesized in your application or elsewhere on the computer.
pascal short SpeechBusySystemWide (void);
DESCRIPTION
This routine is useful when you want to ensure that no speech is currently being produced anywhere on the Macintosh computer. SpeechBusySystemWide returns the total number of speech channels currently synthesizing speech on the computer, whether they were initiated by your code or by some other process executing concurrently.
RESULT CODES
None
Converting Text Into Phonemes
In some situations it is desirable to convert a text string into its equivalent phonemic representation. This may be useful during the content development process to fine-tune the pronunciation of particular words or phrases. By first converting the target phrase into phonemes, you can see what the synthesizer will try to speak. Then you need only correct the parts that would not have been spoken the way you want.
TextToPhonemes
The TextToPhonemes routine converts a designated text to phoneme codes.
pascal OSErr TextToPhonemes (SpeechChannel chan, Ptr textBuf, long textBytes, Handle phonemeBuf, long *phonemeBytes);

Field descriptions
chan           Specific speech channel
textBuf        Buffer of text to be converted
textBytes      Length of textBuf in bytes
phonemeBuf     Handle into which the converted phonemes are written
*phonemeBytes  Pointer to length of phonemeBuf in bytes
DESCRIPTION
It may be useful to convert your text into phonemes during application development in order to be able to reduce the amount of memory required to speak. If your application does not require the text-to-phoneme conversion portion of the speech synthesizer, significantly less RAM may be required to speak with some synthesizers. Additionally, you may be able to use a higher quality text-to-phoneme conversion process (even one that does not work in real time) to generate precise phonemic information. This data can then be used with any speech synthesizer to produce better speech.
TextToPhonemes accepts a valid SpeechChannel parameter, a pointer to the characters to be converted into phonemes, the length of the input text buffer in bytes, an application-supplied handle into which the converted phonemes can be written, and a length parameter. On return, the phonemeBytes argument is set to the number of phoneme character bytes that were written into phonemeBuf. The data returned by TextToPhonemes will correspond precisely to the phonemes that would be spoken had the input text been sent to SpeakText instead. All current mode settings are applied to the converted speech. No callbacks are generated while the TextToPhonemes routine is generating its output.
RESULT CODES
noErr                   0    No error
paramErr              –50    Parameter value is invalid
nilHandleErr         –109    Handle argument is nil
siUnknownInfoType    –231    Feature not implemented on synthesizer
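A conversion call might look like the sketch below. It assumes, per the handle-based interface, that the Speech Manager resizes the application-supplied handle to fit the output:

OSErr PrintPhonemes (SpeechChannel chan, Ptr textBuf, long textBytes) {
    Handle phonemeBuf;
    long phonemeBytes;
    OSErr err;

    phonemeBuf = NewHandle(0);
    if (phonemeBuf == NULL)
        return MemError();
    err = TextToPhonemes(chan, textBuf, textBytes, phonemeBuf, &phonemeBytes);
    if (err == noErr) {
        // ... examine phonemeBytes bytes of phonemic text in *phonemeBuf ...
    }
    DisposeHandle(phonemeBuf);
    return err;
}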
Getting Information About a Speech Channel
Several additional types of information are available for advanced users of the Speech Manager. This information provides more detailed status information for each channel. You can get this information by calling the GetSpeechInfo routine. This function accepts selectors that determine the type of information you want to get.
Note
Throughout this document, there are several references to parameter values specified with fixed-point integer values (pbas, pmod, rate, and volm). Unless otherwise stated, the full range of values of the Fixed data type is valid. However, it is left to the individual speech synthesizer implementation to determine whether or not to use the full resolution and range of the Fixed data type. In the event a specified parameter value lies outside the range supported by a particular synthesizer, the synthesizer will substitute the value closest to the specified value that does lie within its performance specifications.
GetSpeechInfo
The GetSpeechInfo routine returns information about a designated speech channel.
pascal OSErr GetSpeechInfo (SpeechChannel chan, OSType selector, void *speechInfo);

enum {
    soStatus         = 'stat',  // gets speech status info
    soErrors         = 'erro',  // gets error status info
    soInputMode      = 'inpt',  // gets current text/phon mode
    soCharacterMode  = 'char',  // gets current character mode
    soNumberMode     = 'nmbr',  // gets current number mode
    soRate           = 'rate',  // gets current speaking rate
    soPitchBase      = 'pbas',  // gets current baseline pitch
    soPitchMod       = 'pmod',  // gets current pitch modulation
    soVolume         = 'volm',  // gets current speaking volume
    soSynthType      = 'vers',  // gets speech synth version info
    soRecentSync     = 'sync',  // gets most recent sync message info
    soPhonemeSymbols = 'phsy',  // gets phoneme symbols & example words
    soSynthExtension = 'xtnd'   // gets synthesizer-specific info
};
Field descriptions
chan Specific speech channel
selector Used to specify data being requested
*speechInfo Pointer to an information structure
DESCRIPTION
The following list of selectors describes the various types of information that can be obtained from the Speech Manager. The format of the information returned depends on which value is used in the selector field, as follows:
Note
For future code compatibility, use the application programming interface (API) labels instead of literal selector values.
Field descriptions
stat Gets various items of status information for the specified channel. Indicates whether any speech audio is being generated, whether or not the channel has paused, how many bytes in the input text have yet to be processed, and the phoneme code for the phoneme that is currently being generated. If inputBytesLeft is 0, the input buffer is no longer needed and can be disposed of. The API label for this selector is soStatus.
typedef SpeechStatusInfo *speechInfo;
typedef struct SpeechStatusInfo {
    Boolean outputBusy;      // true = audio playing
    Boolean outputPaused;    // true = channel paused
    long    inputBytesLeft;  // bytes left to process
    short   phonemeCode;     // current phoneme code
} SpeechStatusInfo;
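A sketch of polling this status, for example to learn when a channel has finished speaking or its input buffer may be released:

Boolean ChannelIsSpeaking (SpeechChannel chan) {
    SpeechStatusInfo status;

    if (GetSpeechInfo(chan, soStatus, &status) != noErr)
        return FALSE;
    return status.outputBusy;  // true while audio is being generated
}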
erro Gets saved error information and clears the error registers. This selector lets you poll for various run-time errors that occur during speaking, such as the detection of badly formed embedded commands. Errors returned directly by Speech Manager routines are not reported here. The count field shows how many errors have occurred since the last check. If count is 0 or 1, then oldest and newest will be the same. Otherwise, oldest contains the error code for the oldest unread error and newest contains the error code for the most recent error. Both oldPos and newPos contain the character positions of their respective errors in the original input text buffer. The API label for this selector is soErrors.
typedef SpeechErrorInfo *speechInfo;
typedef struct SpeechErrorInfo {
    short count;   // # of errs since last check
    OSErr oldest;  // oldest unread error
    long  oldPos;  // char position of oldest err
    OSErr newest;  // most recent error
    long  newPos;  // char position of newest err
} SpeechErrorInfo;
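Polling for embedded-command errors might look like this sketch; ReportSpeechError is a hypothetical application routine:

void PollSpeechErrors (SpeechChannel chan) {
    SpeechErrorInfo errInfo;

    if ((GetSpeechInfo(chan, soErrors, &errInfo) == noErr) &&
            (errInfo.count > 0))
        ReportSpeechError(errInfo.newest, errInfo.newPos);  // hypothetical
}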
inpt Gets the current value of the text processing mode control. The returned value specifies whether the specified speech channel is currently in text-input mode (TEXT) or phoneme-input mode (PHON). The API label for this selector is soInputMode.
typedef OSType *speechInfo; // TEXT or PHON
char Gets the current value of the character processing mode control. The returned value specifies whether the specified speech channel is currently processing input characters in normal mode (NORM) or in literal, letter-by-letter, mode (LTRL). The API label for this selector is soCharacterMode.
typedef OSType *speechInfo; // NORM or LTRL
nmbr Gets the current value of the number processing mode control. The returned value specifies whether the specified speech channel is currently processing input character digits in normal mode (NORM) or in literal, digit-by-digit, mode (LTRL). The API label for this selector is soNumberMode.
typedef OSType *speechInfo; // NORM or LTRL
rate Gets the current speaking rate in words per minute on the specified channel. Speaking rates are fixed-point values. The API label for this selector is soRate.
typedef Fixed *speechInfo;
Note
Words per minute is a convenient, if difficult to define, way of representing speaking rate. Although there is no universally accepted definition of words per minute, it does communicate approximate information about speaking rates. Any specific rate may correspond to different rates on different synthesizers, but the two rates should be reasonably close. More importantly, doubling the rate on a particular synthesizer should halve the time needed to speak any particular utterance.
pbas Gets the current baseline pitch for the specified channel. The pitch value is a fixed-point integer that conforms to the following frequency relationship:
Hertz = 440.0 * 2^((BasePitch - 69) / 12)
BasePitch of 1.0 ≈ 9 Hertz
BasePitch of 39.5 ≈ 80 Hertz
BasePitch of 45.8 ≈ 115 Hertz
BasePitch of 50.4 ≈ 150 Hertz
BasePitch of 100.0 ≈ 2637 Hertz
BasePitch values are always positive numbers in the range from 1.0 through 100.0. The API label for this selector is soPitchBase.
typedef Fixed *speechInfo;
pmod Gets the current pitch modulation range for the speech channel. Modulation values range from 0.0 through 100.0. A value of 0.0 corresponds to no modulation and means the channel will speak in a monotone. The API label for this selector is soPitchMod.
Nonzero modulation values correspond to pitch and frequency deviations according to the following formula:
Maximum pitch = BasePitch + PitchMod
Minimum pitch = BasePitch - PitchMod
Maximum Hertz = BaseHertz * 2^(+ModValue / 12)
Minimum Hertz = BaseHertz * 2^(-ModValue / 12)
Given:
BasePitch of 46.0 (≈ 115 Hertz),
PitchMod of 2.0,
Then:
Maximum pitch = 48.0 (≈131 Hertz),
Minimum pitch = 44.0 (≈104 Hertz)
typedef Fixed *speechInfo;
volm Gets the current setting of the volume control on the specified channel. Volumes are expressed in fixed-point units ranging from 0.0 through 1.0. A value of 0.0 corresponds to silence, and a value of 1.0 corresponds to the maximum possible volume. Volume units lie on a scale that is linear with amplitude or voltage. A doubling of perceived loudness corresponds to a doubling of the volume. The API label for this selector is soVolume.
typedef Fixed *speechInfo;
vers Gets descriptive information for the type of speech synthesizer being used on the specified speech channel. The API label for this selector is soSynthType.
typedef SpeechVersionInfo *speechInfo;
typedef struct SpeechVersionInfo {
    OSType     synthType;          // always 'ttsc'
    OSType     synthSubType;       // synth flavor
    OSType     synthManufacturer;  // synth creator
    long       synthFlags;         // reserved
    NumVersion synthVersion;       // synth version
} SpeechVersionInfo;
sync Returns the sync message code for the most recently encountered embedded sync command at the audio output point. If no sync command has been encountered, 0 is returned. Refer to the section “Embedded Speech Commands,” later in this document, for information about sync commands. The API label for this selector is soRecentSync.
typedef OSType *speechInfo;
phsy Returns a list of phoneme symbols and example words defined for the current synthesizer. The input parameter is the address of a handle variable. On return, this handle refers to the array of phoneme definitions (a PhonemeDescriptor structure). Make sure to dispose of the handle when you are done using it. This information is normally used to indicate to the user the approximate sounds corresponding to various phonemes, an important feature in international speech. The API label for this selector is soPhonemeSymbols.
typedef PhonemeDescriptor ***speechInfo;  // VAR Handle
typedef struct PhonemeInfo {
    short opcode;        // opcode for the phoneme
    Str15 phStr;         // corresponding character string
    Str31 exampleStr;    // word that shows use of phoneme
    short hiliteStart;   // part of example word to be
    short hiliteEnd;     // highlighted as in TextEdit selections
} PhonemeInfo;

typedef struct PhonemeDescriptor {
    short       phonemeCount;    // # of elements
    PhonemeInfo thePhonemes[1];  // element list
} PhonemeDescriptor;
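Because the returned handle belongs to the application, a caller should dispose of it when finished, as in this sketch (DisplayPhoneme is a hypothetical application routine):

void ShowPhonemeList (SpeechChannel chan) {
    PhonemeDescriptor **pd;
    short i;

    if (GetSpeechInfo(chan, soPhonemeSymbols, &pd) == noErr) {
        for (i = 0; i < (*pd)->phonemeCount; i++)
            DisplayPhoneme(&(*pd)->thePhonemes[i]);  // hypothetical routine
        DisposeHandle((Handle)pd);  // the application owns the handle
    }
}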
xtnd This call supports a general method for extending the functionality of the Speech Manager. It is used to get synthesizer-specific information. The format of the returned data is determined by the specific synthesizer queried. The speechInfo argument should be a pointer to the proper data structure. If a particular synthCreator value is not recognized by the synthesizer, the command is ignored and the siUnknownInfoType code is returned. The API label for this selector is soSynthExtension.
typedef SpeechXtndData *speechInfo;
typedef struct SpeechXtndData {
    OSType synthCreator;  // synth creator ID
    Byte   synthData[2];  // data TBD by synth
} SpeechXtndData;
RESULT CODES
noErr                   0    No error
siUnknownInfoType    –231    Feature is not implemented on synthesizer
The Speech Manager provides numerous control features for sophisticated developers. These controls enable you to set various speaking parameters programmatically and provide a rich set of callback routines that can be used to notify applications of various conditions within the speaking process. Many speech synthesizers extend these controls further.
These controls are accessed with the SetSpeechInfo routine. All calls to this routine expect a SpeechChannel parameter, a selector to indicate the desired function, and a pointer to some data. The format of this data depends on the particular selector and is documented in the following routine description.
SetSpeechInfo
The SetSpeechInfo routine sets information for a designated speech channel.
pascal OSErr SetSpeechInfo (SpeechChannel chan, OSType selector, void *speechInfo);

enum {
    soInputMode          = 'inpt',  // text/phoneme input mode
    soCharacterMode      = 'char',  // character processing mode
    soNumberMode         = 'nmbr',  // number processing mode
    soRate               = 'rate',  // speaking rate
    soPitchBase          = 'pbas',  // baseline pitch
    soPitchMod           = 'pmod',  // pitch modulation
    soVolume             = 'volm',  // speaking volume
    soCurrentVoice       = 'cvox',  // current voice
    soCommandDelimiter   = 'dlim',  // embedded command delimiters
    soReset              = 'rset',  // reset channel to defaults
    soCurrentA5          = 'myA5',  // A5 setup for callbacks
    soRefCon             = 'refc',  // reference constant for callbacks
    soTextDoneCallBack   = 'tdcb',  // text-done callback
    soSpeechDoneCallBack = 'sdcb',  // end-of-speech callback
    soSyncCallBack       = 'sycb',  // sync command callback
    soErrorCallBack      = 'ercb',  // error callback
    soPhonemeCallBack    = 'phcb',  // phoneme callback
    soWordCallBack       = 'wdcb',  // word callback
    soSynthExtension     = 'xtnd'   // synthesizer-specific info
};
Field descriptions
chan Specific speech channel
selector Used to specify the data being set
*speechInfo Pointer to an information structure
DESCRIPTION
The following list of selectors outlines the controls available with the Speech Manager. The format of the information passed depends on which value is used in the selector field, as follows:
Note
The Speech Manager supports several callback features that can provide the sophisticated developer with a tight coupling to the speech synthesis process. However, these callbacks must be used carefully. Each is invoked from interrupt level. This means that you may not perform any operations that might cause memory to be allocated, purged, or moved. Although application global variables are also ordinarily not accessible at interrupt time, the soCurrentA5 ('myA5') selector described in the following text can be used to ask the Speech Manager to point register A5 at your application’s global variables prior to each callback. This makes it fairly painless to access global variables from your callback handlers. If this information worries you, don’t despair. Most information available through callbacks is also available through a GetSpeechInfo call. These calls are more friendly and do not come with the constraints imposed upon callback code. The only drawback is that if you do not poll the information you are interested in often enough, you may miss some of the changes in your speech channel’s status.
Field descriptions
inpt Sets the current value of the text processing mode control. The passed value specifies whether the speech channel should be in text-input mode (TEXT) or phoneme-input mode (PHON). Input mode changes take effect as soon as possible; however, the precise latency is dependent upon the specific speech synthesizer. The API label for this selector is soInputMode.
typedef OSType *speechInfo; // TEXT or PHON
char Sets the current value of the character processing mode control. The passed value specifies whether the speech channel should be in normal character processing mode (NORM) or literal, letter-by-letter, mode (LTRL). Character mode changes take effect as soon as possible; however, the precise latency is dependent upon the specific speech synthesizer. The API label for this selector is soCharacterMode.
typedef OSType *speechInfo; // NORM or LTRL
nmbr Sets the current value of the number processing mode control. The passed value specifies whether the specified speech channel should be in normal number processing mode (NORM) or in literal, digit-by-digit, mode (LTRL). The number mode changes take effect as soon as possible. However, the precise latency is dependent upon the specific speech synthesizer. The API label for this selector is soNumberMode.
typedef OSType *speechInfo; // NORM or LTRL
rate Sets the speaking rate in words per minute on the specified channel. Speaking rates are fixed-point values. All values are valid; however, specific synthesizers will not necessarily be able to speak at all possible rates. The API label for this selector is soRate.
typedef Fixed *speechInfo;
pbas Changes the current baseline pitch for the specified channel. The pitch value is a fixed-point integer that conforms to the following frequency relationship:
Hertz = 440.0 * 2^((BasePitch - 69) / 12)
BasePitch of 1.0 ≈ 9 Hertz
BasePitch of 39.5 ≈ 80 Hertz
BasePitch of 45.8 ≈ 115 Hertz
BasePitch of 50.4 ≈ 150 Hertz
BasePitch of 100.0 ≈ 2637 Hertz
BasePitch values are always positive numbers in the range from 1.0 through 100.0.
typedef Fixed *speechInfo;
The API label for this selector is soPitchBase.
pmod Changes the current pitch modulation range for the speech channel. Modulation values range from 0.0 through 100.0. A value of 0.0 corresponds to no modulation and means the channel will speak in a monotone. Nonzero modulation values correspond to pitch and frequency deviations according to the following formula:
Maximum pitch = BasePitch + PitchMod
Minimum pitch = BasePitch - PitchMod
Maximum Hertz = BaseHertz * 2^(+ModValue / 12)
Minimum Hertz = BaseHertz * 2^(-ModValue / 12)
Given:
BasePitch of 46.0 (≈115 Hertz),
PitchMod of 2.0,
Then:
Maximum pitch = 48.0 (≈131 Hertz),
Minimum pitch = 44.0 (≈104 Hertz)
typedef Fixed *speechInfo;
The API label for this selector is soPitchMod.
volm Changes the current speaking volume on the specified channel. Volumes are expressed in fixed-point units ranging from 0.0 through 1.0. A value of 0.0 corresponds to silence, and a value of 1.0 corresponds to the maximum possible volume. Volume units lie on a scale that is linear with amplitude or voltage. A doubling of perceived loudness corresponds to a doubling of the volume. The API label for this selector is soVolume.
typedef Fixed *speechInfo;
cvox Changes the current voice on the current speech channel to the specified voice. Note that this control call will return an incompatibleVoice error if the specified voice is incompatible with the speech synthesizer associated with the speech channel. The API label for this selector is soCurrentVoice.
typedef VoiceSpec *speechInfo;
dlim Sets the delimiter character strings for embedded commands. The start of an embedded command is determined by comparing the input characters to the start-command delimiter string. Likewise, the end of a command is determined by comparing the input characters to the end-command delimiter string. Command delimiter strings are either 1 or 2 bytes in length. If a single byte delimiter is desired, it should be followed by a null (0) byte. Delimiter characters must come from the set of printable characters. If the delimiter strings are empty, this will have the effect of disabling embedded command processing. Care must be taken not to choose delimiter strings that might occur naturally in the text to be spoken. The API label for this selector is soCommandDelimiter.
typedef DelimiterInfo *speechInfo;
typedef struct DelimiterInfo {
    Byte startDelimiter[2];  // defaults to "[["
    Byte endDelimiter[2];    // defaults to "]]"
} DelimiterInfo;
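Switching to hypothetical angle-bracket delimiters, for instance, might look like this sketch:

OSErr UseAngleBracketDelimiters (SpeechChannel chan) {
    DelimiterInfo delims;

    delims.startDelimiter[0] = '<';  // illustrative choice only; pick
    delims.startDelimiter[1] = '<';  // strings unlikely to occur in
    delims.endDelimiter[0]   = '>';  // the text to be spoken
    delims.endDelimiter[1]   = '>';
    return SetSpeechInfo(chan, soCommandDelimiter, &delims);
}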
rset Resets the speech channel to its default states. The speechInfo parameter should be set to 0. Specific synthesizers may provide other reset capabilities. The API label for this selector is soReset.
typedef long *speechInfo;
myA5 An application uses this selector to request that the speech synthesizer set up an A5 world prior to all callbacks. In order for an application to access any of its global data, it is necessary that register A5 contain the correct value, since all global variables are referenced relative to register A5. If you pass a non-null value in the speechInfo parameter, the speech synthesizer will set register A5 to this value just before it calls one of your callback routines. The A5 register is restored to its original value when your callback routine returns. The API label for this selector is soCurrentA5.
typedef Ptr speechInfo;
A typical application would make the call to SetSpeechInfo with code like the following:
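A sketch of such a call, assuming chan is an open speech channel and using the Toolbox routine SetCurrentA5 to capture the application's A5 at non-interrupt time:

OSErr err;

// Pass the application's A5 so callbacks can reach global variables
err = SetSpeechInfo(chan, soCurrentA5, (Ptr)SetCurrentA5());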
refc Sets the reference constant associated with the specified channel. All callbacks generated for this channel will return this reference constant for use by the application. The application can use this value any way it wants to. The API label for this selector is soRefCon.
typedef long *speechInfo;
tdcb Enables the callback that signals that text input processing is done. Your callback routine is invoked when the current buffer of input text has been processed and is no longer needed by the speech synthesizer. This callback does not indicate that the synthesizer is finished speaking the text (see the sdcb callback description, next), merely that the input text has been fully processed and is no longer needed by the speech synthesizer. This callback can be disabled by passing a null ProcPtr in the speechInfo parameter. When your callback routine is invoked, you have two options. If you set the nextBuf, byteLen, and controlFlags variables before returning, you will enable the speech synthesizer to continue speaking without any interruption in the output. If you set the nextBuf parameter to null, you are indicating that you have no more text to speak. The controlFlags parameter is defined as in SpeakBuffer. The API label for this selector is soTextDoneCallBack.
typedef Ptr speechInfo;
pascal void MyInputDoneCallback (SpeechChannel chan, long refCon, Ptr *nextBuf, long *byteLen, long *controlFlags);
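A minimal text-done callback that supplies no further text and flags the input buffer as disposable might look like this sketch; gInputDone is a hypothetical application global (accessible at interrupt time only if soCurrentA5 has been set up):

static Boolean gInputDone;  // hypothetical application global

pascal void MyInputDoneCallback (SpeechChannel chan, long refCon,
        Ptr *nextBuf, long *byteLen, long *controlFlags) {
    *nextBuf = NULL;    // no more text to speak
    gInputDone = TRUE;  // main loop may now release the input buffer
}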
sdcb Enables an end-of-speech callback. Your callback routine is called whenever an input text stream has been completely processed and spoken. When your callback routine is invoked, you can be certain that the speech channel is now idle and no audio is being generated. This callback can be disabled by passing a null ProcPtr in the speechInfo parameter. The API label for this selector is soSpeechDoneCallBack.
typedef Ptr speechInfo;
pascal void MyEndOfSpeechCallback (SpeechChannel chan, long refCon);
sycb Enables the sync command callback. Your callback routine is invoked when the text following a sync embedded command is about to be spoken. This callback can be disabled by passing a null ProcPtr in the speechInfo parameter. See “Embedded Speech Commands,” later in this document, for a description of how to use sync commands. The API label for this selector is soSyncCallBack.
typedef Ptr speechInfo;
pascal void MySyncCommandCallback (SpeechChannel chan,
        long refCon, OSType syncMessage);
ercb Enables error callbacks. Your callback routine is called whenever an error occurs during the processing of an input text stream. Errors can result from syntax problems in the input text, insufficient CPU processing speed (such as an audio data underrun), or other conditions that may arise during the speech conversion process. If error callbacks have not been enabled, the Speech Manager saves the value of each error condition it detects; the error codes can then be read using the GetSpeechInfo status selector soErrors (erro). The error callback can be disabled by passing a null ProcPtr in the speechInfo parameter. The API label for this selector is soErrorCallBack.
typedef Ptr speechInfo;
pascal void MyErrorCallback (SpeechChannel chan,
long refCon, OSErr error, long bytePos);
phcb Enables phoneme callbacks. Your callback routine is invoked for each phoneme generated by the speech synthesizer just before the phoneme is actually spoken. This callback can be disabled by passing a null ProcPtr in the speechInfo parameter. The API label for this selector is soPhonemeCallBack.
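By analogy with the other callback selectors, the declarations presumably take the following form (the phoneme opcodes are those listed in Table 1-2):

typedef Ptr speechInfo;
pascal void MyPhonemeCallback (SpeechChannel chan,
        long refCon, short phonemeOpcode);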
wdcb Enables word callbacks. Your callback routine is invoked for each word generated by the speech synthesizer just before the word is actually spoken. This callback can be disabled by passing a null ProcPtr in the speechInfo parameter. The API label for this selector is soWordCallBack.
typedef Ptr speechInfo;
pascal void MyWordCallback (SpeechChannel chan,
long refCon, long wordPos, short wordLen);
xtnd This call supports a general method for extending the functionality of the Speech Manager. It is used to set synthesizer-specific information. The speechInfo argument should be a pointer to the appropriate data structure. If a particular synthCreator value is not recognized by the synthesizer, the command is ignored and an siUnknownInfoType code is returned. The API label for this selector is soSynthExtension.
typedef SpeechXtndData *speechInfo;
typedef struct SpeechXtndData {
OSType synthCreator; // synth creator ID
Byte synthData[2]; // data TBD by synth
} SpeechXtndData;
RESULT CODES
noErr                  0    No error
paramErr             –50    Parameter value is invalid
siUnknownInfoType   –231    Feature is not implemented on synthesizer
incompatibleVoice   –245    Specified voice cannot be used with synthesizer
Pronunciation Dictionaries

No matter how sophisticated a speech synthesis system is, there will always be words that it does not automatically pronounce correctly. The clearest examples of frequently mispronounced words are proper names (names of people, places, and so on).
One way to get around this fundamental limitation is to use a dictionary of pronunciations. Whenever a speech synthesizer needs to determine the proper phonemic representation for a particular word, it first looks for the word in its dictionaries. Pronunciation dictionary entries contain information that enables precise conversion between text and the correct phoneme codes. They also provide stress, intonation, and other information to help speech synthesizers produce more natural speech. If the word in question is found in the dictionary, then the synthesizer uses the information from the dictionary entry rather than relying on its own letter-to-sound rules. The use of phonemes is described in “Summary of Phonemes and Prosodic Controls,” later in this document.
The Speech Manager word storage format provides high-quality data that is interchangeable between speech synthesizers. The Speech Manager also uses an easily extensible dictionary structure that does not affect the usability of existing dictionaries.
It is assumed that application-defined pronunciation dictionaries will reside in RAM when in use. The run-time structure of dictionary data presumably depends on the specific needs of particular speech synthesizers and will therefore differ from the structure of the dictionaries as stored on disk.
Associating a Dictionary With a Speech Channel
The following routines can be used to associate an application-defined pronunciation dictionary with a particular speech channel.
UseDictionary
The UseDictionary routine associates a designated dictionary with a specific speech channel.
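Its declaration takes the target speech channel and a handle to the dictionary data:

pascal OSErr UseDictionary (SpeechChannel chan, Handle dictionary);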
The speech synthesizer will attempt to use the dictionary data pointed to by the dictionary handle argument to augment the built-in pronunciation rules on the specified speech channel. The synthesizer will use whatever elements of the dictionary resource it considers useful to the speech conversion process. After returning from UseDictionary, the caller is free to release any storage allocated for the dictionary handle. The search order for application-provided dictionaries is last in, first searched.
All details of how an application-provided dictionary is represented within the speech synthesizer are dependent on the specific synthesizer implementation and are totally private to the synthesizer.
RESULT CODES
noErr             0    No error
memFullErr     –108    Not enough memory to use new dictionary
badDictFormat  –246    Format problem with pronunciation dictionary
Each application-defined pronunciation dictionary is implemented as a single resource of type 'dict'. To read the dictionary contents, the system first reads the resource into memory using Resource Manager routines.
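For example, an application might install a dictionary as follows (a minimal sketch; kMyDictionaryID is a hypothetical resource ID, and mySpeechChannel is assumed to be a valid speech channel):

Handle  dict;
OSErr   err;

dict = GetResource ('dict', kMyDictionaryID);    // read dictionary resource into memory
if (dict != NULL) {
    err = UseDictionary (mySpeechChannel, dict); // synthesizer copies what it needs
    ReleaseResource (dict);                      // storage may be released after the call
}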
An application dictionary contains the following information:
total byte length (long) (Length is all-inclusive)
atom type (long)
format version (long)
script code (short)
language code (short)
region code (short)
date last modified (long) (Seconds since January 1, 1904)
reserved(4) (long)
entry count (long)
list of entries
The currently defined atom type is
'dict'  →  Dictionary
Each entry consists of the following:
entry byte length (short) (Length is all-inclusive)
entry type (short)
field count (short)
list of fields
The currently defined entry types are the following:
0x00          →  Null entry
0x01 to 0x20  →  Reserved
0x21          →  Pronunciation entry
0x22          →  Abbreviation entry
Each field consists of the following:
field byte length (short) (Length is all-inclusive minus padding)
field type (short)
field data (char[]) (Data is padded to word boundary)
The currently defined field types are the following:
0x00          →  Null field
0x01 to 0x20  →  Reserved
0x21          →  Word represented in textual format
0x22          →  Phonemic pronunciation, including a complete set of syllable, lexical stress, word prominence, and prosodic markers, represented in textual format
0x23          →  Part-of-speech code
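For illustration only, the layout described above could be expressed as the following C structures (these type and field names are hypothetical; the Speech Manager interfaces define no such types for applications, and all multibyte fields are stored big-endian, as is standard for Macintosh resources):

typedef struct {                    // dictionary header
    long    totalByteLength;        // all-inclusive length
    OSType  atomType;               // 'dict'
    long    formatVersion;
    short   scriptCode;
    short   languageCode;
    short   regionCode;
    long    dateLastModified;       // seconds since January 1, 1904
    long    reserved[4];
    long    entryCount;
                                    // variable-length list of entries follows
} DictHeader;

typedef struct {                    // entry header
    short   entryByteLength;        // all-inclusive length
    short   entryType;              // 0x21 = pronunciation, 0x22 = abbreviation
    short   fieldCount;
                                    // variable-length list of fields follows
} DictEntry;

typedef struct {                    // field header
    short   fieldByteLength;        // all-inclusive length minus padding
    short   fieldType;              // 0x21 = word text, 0x22 = pronunciation,
                                    // 0x23 = part-of-speech code
    char    fieldData[1];           // data padded to a word boundary
} DictField;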
Creating and Editing Dictionaries
There is no built-in support for creating and editing speech dictionaries. You can create dictionary resources using any of the available resource editing tools such as the MPW Rez tool or ResEdit. Of course, you can also fairly easily develop routines to edit the dictionary structure from within the application. At the present time, no assumption should be made that the entries in a dictionary are stored in sorted order.
Advanced Voice Information Routines
Ordinarily, an application should need to use only the GetVoiceDescription routine to access information about a particular voice. Occasionally, however, it may be necessary to obtain more detailed information by using the GetVoiceInfo routine.
GetVoiceInfo
The GetVoiceInfo routine returns information about a specified voice beyond that obtainable through the GetVoiceDescription routine.
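Its declaration parallels that of GetVoiceDescription:

pascal OSErr GetVoiceInfo (VoiceSpec *voice, OSType selector, void *voiceInfo);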
typedef struct VoiceFileInfo {
FSSpec fileSpec; // vol, dir, name info for voice file
short resID; // resource ID of voice in the file
} VoiceFileInfo;
enum {
soVoiceDescription = 'info', // gets basic voice info
soVoiceFile = 'fref' // gets voice file ref info
};
Field descriptions
*voice Specifies the voice to be interrogated
selector Used to specify data being requested
*voiceInfo Pointer to an information structure
DESCRIPTION
This function accepts selectors that determine the type of information you want to get. The format of the information returned depends on which value is used in the selector field, as follows:
Field descriptions
info Gets basic information for the specified voice. The structure returned is functionally equivalent to the VoiceDescription data structure in GetVoiceDescription, described earlier in this document. To maximize compatibility with future versions of the Speech Manager, the application must set the length field of the VoiceDescription structure to the size of the existing record before calling GetVoiceInfo, which then returns the size of the new record.
fref Gets file reference information for specified voice; normally only used by speech synthesizers to access voice disk files directly.
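For example, basic information for a voice could be requested with the info selector as follows (a sketch; myVoiceSpec is assumed to identify a valid voice):

VoiceDescription  desc;
OSErr             err;

desc.length = sizeof (VoiceDescription);    // set length field before the call
err = GetVoiceInfo (&myVoiceSpec, soVoiceDescription, &desc);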
RESULT CODES
noErr              0    No error
memFullErr      –108    Not enough memory to load voice into memory
voiceNotFound   –244    Voice resource not found
Embedded Speech Commands
This section describes how you can insert commands directly into the input text to control or modify the spoken output. When processing input text data, speech synthesizers look for special sequences of characters called delimiters. These character sequences are usually defined to be unusual pairings of printable characters that would not normally appear in the text. When a begin command delimiter string is encountered in the text, the following characters are assumed to contain one or more commands. The synthesizer will attempt to parse and process these commands until an end command delimiter string is encountered.
Embedded Speech Command Syntax
By default, the begin command and end command delimiters are defined to be [[ and ]]. The syntax of embedded command blocks is given below, according to these rules:
■ Items enclosed in angle brackets (< and >) represent logical units that are either defined further below or are atomic units that should be self-explanatory.
■ Items enclosed in square brackets ([ and ]) are optional.
■ Items followed by an ellipsis (…) may be repeated one or more times.
■ For items separated by a vertical bar (|), any one of the listed items may be used.
■ Multiple space characters between tokens may be used if desired.
■ Multiple commands should be separated by semicolons.
All other characters that are not enclosed between angle brackets must be entered literally. There is no limit to the number of commands that can be included in a single command block.
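For example, with the default delimiters, input text containing a command block might look like the following (a hypothetical illustration):

The meeting will start [[ emph +; rate 90 ]] immediately.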
The following definitions are used in the command descriptions:
OSType           <4-character pattern (e.g., RATE, vers, aBcD)>
Character        <Any printable character (e.g., A, b, *, #, x)>
FixedPointValue  <Decimal number: 0.0000 ≤ N ≤ 65535.9999>
32BitValue       <OSType> | <LongInt> | <HexLongInt>
16BitValue       <Integer> | <HexInteger>
8BitValue        <Byte> | <HexByte>
LongInt          <Decimal number: 0 ≤ N ≤ 4294967295>
HexLongInt       <Hex number: 0x00000000 ≤ N ≤ 0xFFFFFFFF>
Integer          <Decimal number: 0 ≤ N ≤ 65535>
HexInteger       <Hex number: 0x0000 ≤ N ≤ 0xFFFF>
Byte             <Decimal number: 0 ≤ N ≤ 255>
HexByte          <Hex number: 0x00 ≤ N ≤ 0xFF>
Here is the embedded command syntax structure:
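(The structure shown below is a reconstruction consistent with the rules and definitions above; the nonterminal names are illustrative.)

CommandBlock    ::= <BeginDelimiter> <CommandList> <EndDelimiter>
CommandList     ::= <Command> [; <Command>]…
Command         ::= <CommandSelector> [<Parameter>]…
CommandSelector ::= <OSType>
Parameter       ::= <OSType> | <Character>… | <FixedPointValue>
                    | <32BitValue> | <16BitValue> | <8BitValue>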
Embedded Speech Command Set
Table 1-1 outlines the set of currently defined embedded speech commands.
Table 1-1 Embedded speech commands
Command Selector Command syntax and description
Version vers vers <Version>
Version::= <32BitValue>
This command informs the synthesizer of the format version that will be used in subsequent commands. This command is optional but is highly recommended. The current version is 1.
Delimiter dlim dlim <StartDelimiter> <EndDelimiter>
StartDelimiter ::= <Character> [<Character>]
EndDelimiter ::= <Character> [<Character>]
The delimiter command specifies the character sequences that mark the beginning and end of all subsequent commands. The new delimiters take effect at the end of the current command block. If the delimiter strings are empty, an error is generated. (Contrast this behavior with the dlim function of SetSpeechInfo.)
Comment cmnt cmnt [Character]…
This command enables a developer to insert a comment into a text stream for documentation purposes. Note that all characters following the cmnt selector up to the <EndDelimiter> are part of the comment.
Reset rset rset <32BitValue>
The reset command will reset the speech channel’s settings back to the default values. The parameter should be set to 0.
Baseline pitch pbas pbas [+ | -] <Pitch>
Pitch ::= <FixedPointValue>
The baseline pitch command changes the current pitch for the speech channel. The pitch value is a fixed-point number in the range 1.0 through 100.0 that conforms to the frequency relationship
Hertz = 440.0 × 2^((Pitch − 69) / 12)
If the pitch number is preceded by a + or – character, the baseline pitch is adjusted relative to its current value. Pitch values are always positive numbers. For further details, see “SetSpeechInfo,” earlier in this document.
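For example, a baseline pitch of 60.0 yields 440.0 × 2^((60 − 69) / 12) ≈ 261.6 Hz, which is middle C.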
Pitch modulation pmod pmod [+ | -] <ModulationDepth>
ModulationDepth ::= <FixedPointValue>
The pitch modulation command changes the modulation range for the speech channel. The modulation value is a fixed-point number in the range 0.0 through 100.0 that conforms to the following pitch and frequency relationships:
Maximum pitch = BasePitch + PitchMod
Minimum pitch = BasePitch - PitchMod
Maximum Hertz = BaseHertz × 2^(+ModValue / 12)
Minimum Hertz = BaseHertz × 2^(−ModValue / 12)
A value of 0.0 corresponds to no modulation and will cause the speech channel to speak in a monotone. If the modulation depth number is preceded by a + or – character, the pitch modulation is adjusted relative to its current value. For further details, see “SetSpeechInfo,” earlier in this document.
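For example, with a baseline pitch of 60.0 (about 261.6 Hz) and a modulation of 5.0, the pitch may range from 55.0 to 65.0, or from roughly 196 Hz to 349 Hz.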
Speaking rate rate rate [+ | -] <WordsPerMinute>
WordsPerMinute ::= <FixedPointValue>
The speaking rate command sets the speaking rate in words per minute on the speech channel. If the rate value is preceded by a + or – character, the speaking rate is adjusted relative to its current value.
Volume volm volm [+ | -] <Volume>
Volume::= <FixedPointValue>
The volume command changes the speaking volume on the speech channel. Volumes are expressed in fixed-point units ranging from 0.0 through 1.0. A value of 0.0 corresponds to silence, and a value of 1.0 corresponds to the maximum possible volume. Volume units lie on a scale that is linear with amplitude or voltage, so doubling the volume value doubles the amplitude of the output rather than its perceived loudness.
Sync sync sync <SyncMessage>
SyncMessage::= <32BitValue>
The sync command causes a callback to the application’s sync command callback routine. The callback is made when the audio corresponding to the next word begins to sound. The callback routine is passed the SyncMessage value from the command. If the callback routine has not been defined, the command is ignored. For further details, see “SetSpeechInfo,” earlier in this document.
Input mode inpt inpt TX | TEXT | PH | PHON
This command switches the input processing mode to either normal text mode or raw phoneme mode.
Character mode char char NORM | LTRL
The character mode command sets the word speaking mode of the speech synthesizer. When NORM mode is selected, the synthesizer attempts to automatically convert words into speech. This is the most basic function of the text-to-speech synthesizer. When LTRL mode is selected, the synthesizer speaks every word, number, and symbol letter by letter. Embedded command processing continues to function normally, however.
Number mode nmbr nmbr NORM | LTRL
The number mode command sets the number speaking mode of the speech synthesizer. When NORM mode is selected, the synthesizer attempts to automatically speak numeric strings as intelligently as possible. When LTRL mode is selected, numeric strings are spoken digit by digit.
Silence slnc slnc <Milliseconds>
Milliseconds ::= <32BitValue>
The silence command causes the synthesizer to generate silence for the specified amount of time.
Emphasis emph emph + | -
The emphasis command causes the next word to be spoken with either greater emphasis or less emphasis than would normally be used. Using + will force added emphasis, while using – will force reduced emphasis.
Extension xtnd xtnd <SynthCreator> [<Parameter>]…
SynthCreator ::= <OSType>
The extension command enables synthesizer-specific commands to be embedded in the input text stream. The format of the data following SynthCreator is entirely dependent on the synthesizer being used. If a particular SynthCreator is not recognized by the synthesizer, the command is ignored but no error is generated.
Synthesizers often support embedded commands that extend the set given in Table 1-1.
Embedded Speech Command Error Reporting
While embedded speech commands are being processed, several types of errors may be detected and reported to your application. If you have set up an error callback handler with the soErrorCallBack selector of the SetSpeechInfo routine (described earlier), you will be notified once for every error that is detected. If you have not enabled error callbacks, you can still obtain information about the errors encountered by calling GetSpeechInfo with the soErrors selector (also described earlier). The following errors are detected during processing of embedded speech commands:
badParmVal       –245    Parameter value is invalid
badCmdText –246 Embedded command syntax or parameter problem
unimplCmd –247 Embedded command is not implemented on synthesizer
unimplMsg –248 Raw phoneme text contains invalid characters
badVoiceID –250 Specified voice has not been preloaded
badParmCount –252 Incorrect number of embedded command arguments found
Summary of Phonemes and Prosodic Controls
This section summarizes the phonemes and prosodic controls used by American English speech synthesizers.
Phoneme Set
Table 1-2 summarizes the set of standard phonemes recognized by American English speech synthesizers.
In this description, it is assumed that specific rules and markers apply only to general American English. Other languages and dialects require different phoneme inventories. Phonemes divide into two groups: vowels and consonants. All vowel symbols are uppercase pairs of letters. For consonants, in cases in which the correspondence between the consonant and its symbol is apparent, the symbol is that lowercase consonant; in other cases, the symbol is an uppercase consonant. Within the example words, the individual sounds being exemplified appear in boldface.
Table 1-2 American English phoneme symbols

Symbol  Example        Opcode      Symbol  Example  Opcode
AE      bat              2         b       bin        18
EY      bait             3         C       chin       19
AO      caught           4         d       din        20
AX      about            5         D       them       21
IY      beet             6         f       fin        22
EH      bet              7         g       gain       23
IH      bit              8         h       hat        24
AY      bite             9         J       gin        25
IX      roses           10         k       kin        26
AA      cot             11         l       limb       27
UW      boot            12         m       mat        28
UH      book            13         n       nat        29
UX      bud             14         N       tang       30
OW      boat            15         p       pin        31
AW      bout            16         r       ran        32
OY      boy             17         s       sin        33
%       silence          0         S       shin       34
@       breath intake    1         t       tin        35
                                   T       thin       36
                                   v       van        37
                                   w       wet        38
                                   y       yet        39
                                   z       zen        40
                                   Z       genre      41
Note
The “silence” phoneme (%) and the “breath” phoneme (@) may be lengthened or shortened like any other phoneme.
Prosodic Controls
The symbols listed in Table 1-3 are recognized as modifiers to the basic phonemes described in the preceding section. They can be used to more precisely control the quality of speech that is described in terms of raw phonemes.
Table 1-3 Prosodic control symbols

Syllable breaks:
Syllable mark      = (equal)       Marks syllable breaks within a word, as in
                                   AEn=t2IH=sIX=p1EY=SAXn (“anticipation”)

Word prominence: Marks the beginning of a word (required)
Unstressed         ~ (asciitilde)  Used for words with minimal information content
Normal stress      _ (underscore)  Used for information-bearing words
Emphatic stress    + (plus)        Special emphasis for a word

Prosodic: Placed before the affected phoneme
Pitch rise         / (slash)       Pitch will rise on the following phoneme
Pitch fall         \ (backslash)   Pitch will fall on the following phoneme
Lengthen phoneme   > (greater)     Lengthen the duration of the following phoneme
Shorten phoneme    < (less)        Shorten the duration of the following phoneme
Punctuation: Pitch effect Timing effect
. (period) Sentence final fall Pause follows
? (question) Sentence final rise Pause follows
! (exclam) Sentence final sharp fall Pause follows
… (ellipsis) Clause final level Pause follows
, (comma) Continuation rise Short pause follows
; (semicolon) Continuation rise Short pause follows
: (colon) Clause final level Short pause follows
( (parenleft) Start reduced range Short pause precedes
) (parenright) End reduced range Short pause follows
“ ‘ (quotedblleft, quotesingleleft)     Varies                 Varies
” ’ (quotedblright, quotesingleright)   Varies                 Varies
- (hyphen) Clause-final level Short pause follows
& (ampersand) Forces no addition of silence between phonemes
Specific pitch contours associated with these punctuation marks may vary according to other considerations in the analysis of the text, such as whether a question is rhetorical or begins with a wh- question word, so the effects above should be regarded as guidelines rather than absolutes. The same applies to the timing effects, which vary according to the current rate setting.
The prosodic control symbols (/, \, <, and >) may be concatenated to provide more exaggerated, cumulative effects. The specific nature of the effect is dependent on the speech synthesizer. Speech synthesizers also often extend or enhance the controls described in this section.
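For example, a raw phonemic input string using several of these controls might look like the following (a hypothetical illustration; exact stress-mark placement conventions may vary by synthesizer):

[[ inpt PHON ]] _DIHs _IHz ~AX _t1EHst. [[ inpt TEXT ]]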
Summary of the Speech Manager
Constants
#define gestaltSpeechAttr 'ttsc' // Gestalt Manager selector for speech attributes
enum {
gestaltSpeechMgrPresent = 0 // Gestalt bit that indicates that Speech